Abstract. Recent advances in the rational design of drug molecules based on a graph-theoretical
approach are briefly reviewed. Graph theory has not been widely recognized to date as an effective
alternative to the empirical procedures currently prevailing in the development of new drugs.
Moreover, the problems confronting researchers in this field are daunting in their great complexity.
We advocate here a novel yet simple mathematical formalism which opens up a promising new
avenue of research. After outlining the fundamental premises of our method, we exemplify it
by discussing the characterization, comparison, and quantification of similarity among individual
molecules. It is indicated how the essential bioactive component in molecules of compounds
displaying similar pharmacological behavior may be identified. We conclude by describing the
pharmacological classification of 18 compounds, all of which are structurally similar but which
exhibit several differing types of bioactivity.

Keywords.  Drug  design;  graph  theory;  optimization  techniques. 


INTRODUCTION 

The history of drug development abounds in examples
of major discoveries being made either serendipitously
or as a result of following totally erroneous procedures
(Burger, 1983). In spite of this circumstance, however,
it was recognized very early on that the bioactivity
of drug molecules was dependent upon the presence
of special structural features in such molecules. It
was pointed out by Crum Brown and Fraser (1868),
for instance, that the quaternary ammonium group
was essential for the blocking activity of curare-type
drugs. Exactly one hundred years ago, Paul Ehrlich
(1885), the founder of modern medicinal chemistry,
elucidated the role played by enzymes in living systems.
He thereby simplified the problem of drug interaction
from one involving the study of cellular complexity
to one involving complexity at no more than the
molecular level. Ehrlich's work laid the foundation for
our current theories of drug action, drug metabolism,
and drug resistance.

Unfortunately, even today, very little is known about
the mechanism of drug action or the underlying dynamics.
The principal reason for this lacuna is a lack of knowledge
of the structure of the relevant enzymes and an absence
of detailed descriptions of their active sites. By contrast,
most drugs can be viewed as small molecules (with
a few notable exceptions) and are thus much more
accessible to study. The fundamental problem in
bioactivity studies therefore resolves itself into one
of investigating the interaction of a relatively small,
well characterized molecule with an unknown large
protein molecule. Clearly, this represents an exceedingly
difficult problem given the current level of our knowledge.
Moreover, until our understanding of the participating
protein and enzyme molecules approaches that
existing for small molecules, and until our knowledge
of all the intermediate steps which occur in an organism
after administration of the drug becomes very detailed,
such difficulties are likely to remain with us. This
observation necessarily implies that we are very far
from a situation where the use of any kind of rigorous
theoretical technique could be contemplated.

In coming to terms with the present state of affairs,
pragmatism would seem to point in the direction of
sacrificing our curiosity about how the whole process evolves
and focusing instead on what is now within our reach.
If we adopt a typical systems analysis approach (White
and Tauber, 1969), we may probe the system by means
of an appropriate input (drug) and then examine the
resultant output (pharmacological activity).

Approximate schemes, empirical rules, statistical
methods, and mathematical modelling might thus appear
as the only reasonable routes to encompassing the
vast amount of data on drugs which has accumulated
over the years. The situation confronting us,
however, is perhaps not as bleak as it may at first
seem, for there is a good deal of evidence which
indicates that apparently similar compounds exhibit
closely similar pharmacological activities.

One of the first to recognize the relationship between
the structure and activity of drugs was Emil Fischer
(1894) in a paper entitled 'Influence of configuration
on the action of enzymes.' The basic model that he
put forward assumed that enzymes have recognition
sites, i.e. receptor locations, that are highly specific
structurally. Binding to a host, i.e. a drug, would be
possible only if essential structural fragments in the
drug molecule match up precisely with those at the
receptor site. In more informal terms, this matching
can be described in terms of a 'lock and key' analogy,
with the drug playing the role of the key. The current
status of medicinal chemistry can be summed up by
stating that the available 'keys' are being employed
to probe unknown 'locks' with a view to constructing
improved 'keys' that will better fit the 'locks'.

In mathematical parlance, the above represents an
example of a reconstruction problem: by collecting
a fair number of responses, one tries to determine
the optimal input. By inversion of its own connectivity,
an optimal key molecule would certainly be able to
provide valuable information about the structure of
its receptor. In practice, once a reliable lead compound,
i.e. a structure that triggers a useful response, has
been identified, the next problem of selecting structures
with enhanced biological activity would not be soluble
without some guidelines as to the method of picking
out the small number of highly active molecules from
the usually enormous number of possible candidates.
There is an astronomical number of combinatorial
possibilities associated with even a modest number
of substitution sites on a molecule and (say) a dozen
or more potential substituents. Thus, starting from
a given lead molecule, the essential task becomes
one of devising some scheme whereby those few candidate
structures which can function even more effectively
as drugs than the lead can be recognized.

THE FUNDAMENTAL POSTULATE

Currently,  two  fundamentally  different  philosophies 
underlie  the  various  approaches  to  the  rational  design 
of  drugs.  The  first  involves  considering  a  large  data 
set  of  compounds  and  reducing  the  size  of  the  set 
by  means  of  a  number  of  empirical  schemes,  all  of 
which are based essentially on statistical analysis.
This  reduces  the  problem  to  one  of  lower  dimension 
and  normally  gives  an  indication  of  which  parameters 
are  critical.  Once  established,  these  parameters  can 
be  employed  in  the  prediction  of  novel  candidate  drug 
molecules.  Representative  methods  based  on  this 
school  of  thought  include  pattern  recognition  (Stuper 
et  al.,  1979)  and  regression  analysis  (Hansch,  1969). 
The  second  philosophy  entails  considering  a  small  data 
set  of  compounds,  the  aim  now  being  recognition  of 
the  degree  of  similarity  between  compounds  of  similar 
pharmacologic  or  therapeutic  value.  Exclusive  use 
is  made  here  of  structural  parameters  for  the  description 
of  the  drug  molecules,  with  the  emphasis  falling  on 
the  mathematical  properties  of  the  structures  involved. 
Comparison  of  structures  having  similar  mathematical 
properties  is  undertaken  on  the  assumption  that  such 
structures  will  also  display  similar  physical,  chemical, 
and  biological  properties. 

The  fundamental  basis  of  the  second  school  of  thought 
may  be  expressed  in  terms  of  the  following  postulate 
(Randić, 1985a):

POSTULATE: Structures which display substantial
similarity in their mathematical properties will also
display considerable similarity in their physical,
chemical,  and  biological  properties. 

This postulate has a number of very important implications,
each of which we now outline and assess in
some detail:

(i) The various natural properties of chemical species
may be characterized in purely mathematical
terms. This is well known to be the case, witness
the widespread use of topological indices (Bonchev,
1983) in the description of many natural phenomena.
It was first suggested by Rouvray (1973)
that topological indices might be used as mathematical
descriptors for candidate molecules
in drug design studies. The feasibility of this
type of approach has been amply demonstrated
in recent years (Kier and Hall, 1976);

(ii) The natural properties of chemical species are
merely reflections of inherent mathematical
properties of the structures concerned. According
to this view, chemical species characterized
by similar mathematical descriptors will be
possessed of similar physical, chemical, and
biological properties. This statement is equivalent
to saying that, if the mathematical descriptors
of two structures are closely similar, the structures
concerned will behave as isosteres, i.e. molecules
which have related physicochemical properties
and which exhibit broadly similar bioactivity
(Langmuir, 1919; Thornber, 1979). Such a
formulation places the focus of interest on the
mathematical properties of structures and
indicates that the natural properties may be
compared and predicted by assessing the
mathematical features of the structures concerned;

(iii) In going from structure to structure among molecules
which are closely similar, it is postulated
that there exists a rough continuum in the physical,
chemical, and biological properties of the various
species. Although it can never be strictly accurate
to refer to a continuum when reference is made
to discrete objects, i.e. molecules, it is our contention
that very small changes in the neighborhood
relations within the set of molecules will result
in only minor changes in their natural properties.
Thus, provided an appropriate set of structures
is chosen, a more or less continuous range of
properties can be generated without any significant
gaps and with no abrupt changes;

(iv) Based on the above statement, it follows that,
by making a suitable selection of structures
and their substituents, any desired range of natural
properties can be realized for a specific set
of structures. In the light of the foregoing, the
initial selection process would have to entail
characterization of the mathematical nature
of the candidate structures and the collection
together of those structures which display only
slight differences in terms of their mathematical
descriptors. The use of mathematical criteria
for the purpose of clustering structures around
some defined natural property will form the
subject matter of most of the rest of this presentation.

Although our postulate serves as an effective paradigm
in specifying how chemical structures may be clustered
around some natural property of interest, it does not
reveal how the mathematical characteristics will be
manifested in terms of the various natural properties
of the set of chemical compounds investigated. In
this respect, it is quite unlike the axioms of quantum
theory or the laws of classical physics. Moreover,
it should be mentioned that the postulate does not
necessarily apply to individual compounds which are
regarded as forming a continuum. In appropriate
contexts, the postulate can refer to composite systems
such as a drug-enzyme pair. A major criticism of
our type of approach has been that chiral structures
have the same mathematical properties (since they
are identical in all respects apart from irrelevant
mirror reflections) yet display dramatically different
biological activities. Such criticism is invalid because
the biological property under consideration in this
situation is not that arising from a single, isolated
structure but rather one from a drug-receptor pair.
A valid comparison would study the mathematical
properties of the drug-receptor pairs for both antipodal
structures. This sort of comparison would reveal that
the two systems were not alike at all and were in fact
quite different.

OUTLINE  OF  GRAPH-THEORETICAL  SCHEMES 

In order to proceed with our treatment, it will be necessary
to accomplish three tasks, viz. (i) to represent
all the structures of interest in mathematical terms,
i.e. as chemical graphs; (ii) to prescribe some comparability
test that will indicate how much two given
structures differ; and (iii) to recognize the significant
components in the molecules considered. The first
two of these tasks have been addressed in many publications
discussing the graph-theoretical approach
to structure-activity studies (Randić and Wilkins, 1979a,
1979b; Wilkins and Randić, 1980; Wilkins et al., 1981;
Randić, 1985a; Jerman-Blažić et al., 1985), and thus
need not be further elaborated here. For our purposes
we shall represent chemical compounds by their molecular
graphs with the nonessential hydrogen atoms
suppressed in the customary fashion (Trinajstić, 1983),
though we shall also discuss the representation of
structures by appropriately weighted path numbers.
Our main focus of attention, however, will be on task
(iii) and the means of identifying the essential components
within a set of structurally related compounds
displaying differing pharmacological activities.

Let us first consider the compound shown in Figure
1 and assign an arbitrary numbering to the atoms therein.
This particular compound is closely related to all of
the compounds depicted in Figure 2. The simplified
molecular graphs we are using here do not differentiate
between aromatic and aliphatic C-C bonds, and neither
do they distinguish between single C-C and C-N bonds.
Our ultimate interest concerns the set of compounds
shown in Figure 2, all of which possess certain identical
structural features: a phenyl ring and a nitrogen atom
removed from the ring by three bonds. In the present
context, differentiation between bond types turns out
to be of no great consequence; in fact it was recently
demonstrated (Crossman et al., 1985) that characterization
of structures by means of weighted paths (where
heteroatoms were given differing weights) was rather
insensitive to the actual choice of weights.

Our characterization of the compounds illustrated above
will be based on the counts of paths of different lengths.
A path of length k will represent a fragment containing
k consecutive bonds, i.e. a chain of length k. By convention,
paths of length zero represent atoms, paths of
length one count the number of bonds, paths of length
two count the number of pairs of consecutive bonds,
and so on. In Table 1 we present the counts for paths
of increasing length for each of the atoms in the compound
depicted in Figure 1. These counts were arrived
at by making use of the ALLPATH program (Randić
et al., 1979). For compounds having no more than a
single ring and a few atoms, carrying out the counts
is not particularly onerous, yet, even for bicyclic systems,
it becomes impractical to perform the counts by hand.
The last row in Table 1, having the entries 9 9 10 11
12 12 6 4 2, gives the path counts for the molecule
as a whole. These counts can be readily derived from
the data on the individual atoms if it is remembered
that all the paths (except those of zero length) have
been counted twice, once for each end atom. The
molecule in Figure 1 thus has 9 atoms, 9 bonds, 10 pairs
of adjacent bonds, 11 sets of three consecutive bonds,
and so on.
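
The counting procedure just described can be sketched in a few lines of Python. This is a minimal illustration and not the ALLPATH program itself. The atom numbering and bond list are assumptions: a six-membered ring (atoms 1-6) carrying a three-atom chain (atoms 7-8-9) on atom 1, with all bonds treated alike, as suggested by the description of Figure 1 as a phenyl ring with a nitrogen atom three bonds removed from it. The function names are ours.

```python
# Assumed hydrogen-suppressed graph for the nine-atom compound of Figure 1:
# ring atoms 1-6, side chain 7-8-9 attached to ring atom 1 (numbering is ours).
FRAGMENT_BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1),
                  (1, 7), (7, 8), (8, 9)]


def build_graph(bonds):
    """Return an adjacency dict {atom: set of neighbouring atoms}."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def atomic_path_counts(adj, start):
    """counts[k] = number of paths of length k (no atom repeated) from start."""
    counts = [0] * len(adj)

    def walk(atom, visited, length):
        counts[length] += 1
        for nxt in adj[atom]:
            if nxt not in visited:
                visited.add(nxt)
                walk(nxt, visited, length + 1)
                visited.remove(nxt)

    walk(start, {start}, 0)
    return counts


def molecular_path_counts(adj):
    """Sum the atomic counts; every path of length >= 1 was counted twice."""
    totals = [0] * len(adj)
    for atom in adj:
        for k, c in enumerate(atomic_path_counts(adj, atom)):
            totals[k] += c
    return [totals[0]] + [t // 2 for t in totals[1:]]


if __name__ == "__main__":
    print(molecular_path_counts(build_graph(FRAGMENT_BONDS)))
```

With the assumed graph the molecular counts come out as 9, 9, 10, 11, 12, 12, 6, 4, 2, in agreement with the last row of Table 1 quoted above.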

The advantage of using path numbers, as opposed to
molecular fragments, such as bonds or small atomic
groups, is that the path numbers retain some information
on the nonlocal connectivity within the structure.
However, it is evident that paths of intermediate
length dominate the path counts, a fact
which may result in similarities among compounds due
to local characteristics being obscured. From studies
on isomeric variations in the physicochemical properties
of species (Randić and Wilkins, 1979c, 1979d; Randić
and Wilkins, 1980; Randić and Trinajstić, 1982), it has
been established that shorter paths, especially those
of lengths two and three, play a crucial role. It would
thus appear desirable to introduce weighting of the
paths in order that the dominant role of the more
abundant paths of intermediate length can be
counteracted, and the role of the shorter paths given
greater prominence. As weighting of paths based
on a differentiation of bond types has been found
effective for such purposes (Menon and Cammarata,
1977; Randić, 1984a; Randić, 1985a), we shall adopt
this approach here.


To carry out the weighting, each bond is classified as
being of (m,n) type, where m and n are the numbers
of edges emanating from each of the terminal vertices
of the bond in question. For all bond types (m,n), a
weight of (m x n)^(-1/2) is assigned to each bond, following
the same procedure adopted in computing the connectivity
indices of molecules (Randić, 1975). This weighting
procedure may be used in conjunction with the widely
available ALLPATH program (Randić et al., 1980)
provided the weights are entered as input. Alternatively,
a subroutine may be added to the existing program
to automatically introduce weightings in the counting
process (Randić, 1985b). In Table 2 are listed the counts
for the weighted paths of the compound depicted in
Figure 1. The results represent the printed output of
a modified ALLPATH program.
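
A weighted count of this kind may be sketched as follows. The sketch is not the modified ALLPATH program. It assumes the same nine-atom graph as before (ring atoms 1-6, chain 7-8-9 on atom 1; our numbering), and it takes the weight of a path to be the product of the weights of its bonds. The text states only the bond weights, so the product rule is an assumption on our part, although it does reproduce the leading entries quoted from Table 2.

```python
from math import sqrt

# Assumed graph for the compound of Figure 1 (numbering is ours).
FRAGMENT_BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1),
                  (1, 7), (7, 8), (8, 9)]


def build_graph(bonds):
    """Return an adjacency dict {atom: set of neighbouring atoms}."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bond_weight(adj, a, b):
    """Weight (m x n)^(-1/2) of an (m,n) bond; m and n are vertex degrees."""
    return 1.0 / sqrt(len(adj[a]) * len(adj[b]))


def weighted_path_counts(adj):
    """Entry k is the summed weight of all paths of length k in the molecule.

    A path's weight is the product of its bond weights (assumed rule);
    a path of length zero (an atom) has weight 1.
    """
    totals = [0.0] * len(adj)

    def walk(atom, visited, length, weight):
        totals[length] += weight
        for nxt in adj[atom]:
            if nxt not in visited:
                visited.add(nxt)
                walk(nxt, visited, length + 1,
                     weight * bond_weight(adj, atom, nxt))
                visited.remove(nxt)

    for atom in adj:
        walk(atom, {atom}, 0, 1.0)
    # Paths of length >= 1 are reached from both of their end atoms.
    return [totals[0]] + [t / 2 for t in totals[1:]]


if __name__ == "__main__":
    for k, value in enumerate(weighted_path_counts(build_graph(FRAGMENT_BONDS))):
        print(k, round(value, 4))
```

For the assumed graph the entries for lengths one to three are close to 4.4319, 2.2159, and 1.0985, which truncate to the values 4.431, 2.215, and 1.098 reported for the molecule as a whole.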

In addition to the path numbers, i.e. the numbers of
paths of different length, the output also gives for each
atom the total of all the paths pertaining to that atom.
Thus, for atom 1 this total is 3.011 whereas for atom
2 it is only 2.989, and so on. It appears that these 'atomic'
numbers are able to differentiate between atomic
environments; they may therefore be referred to as
atomic identification (ID) numbers. The last line of
the output, reproduced here with the numbers truncated
to three decimal places:

9  4.431  2.215  1.098  0.574  0.276  0.071  0.025  0.007

represents the (weighted) path counts for the molecule
as a whole. The total number of paths, 17.7005, has
been termed the molecular ID (identification number),
and has been shown to be a highly discriminating (Randić,
1984a) though not unique (Szymanski et al., 1985) index.
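
The relation between the atomic ID numbers and the molecular ID can be illustrated with a short sketch. As before, the graph (ring atoms 1-6, chain 7-8-9 on atom 1) and its numbering are assumptions, so the individual atomic values need not correspond to the atom labels used in Table 2, and the product rule for path weights is likewise assumed rather than taken from the text.

```python
from math import sqrt

# Assumed graph for the compound of Figure 1 (numbering is ours).
FRAGMENT_BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1),
                  (1, 7), (7, 8), (8, 9)]


def build_graph(bonds):
    """Return an adjacency dict {atom: set of neighbouring atoms}."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def atomic_ids(adj):
    """Atomic ID = summed weight of every path that starts at the atom."""
    def walk(atom, visited, weight):
        total = weight
        for nxt in adj[atom]:
            if nxt not in visited:
                visited.add(nxt)
                step = 1.0 / sqrt(len(adj[atom]) * len(adj[nxt]))
                total += walk(nxt, visited, weight * step)
                visited.remove(nxt)
        return total

    return {atom: walk(atom, {atom}, 1.0) for atom in adj}


def molecular_id(ids):
    """Each atom counts once; every longer path is shared by two end atoms."""
    n = len(ids)
    return n + (sum(ids.values()) - n) / 2


def partial_id(ids, atoms):
    """Partial sum of atomic IDs over a chosen subset of atoms."""
    return sum(ids[a] for a in atoms)


if __name__ == "__main__":
    ids = atomic_ids(build_graph(FRAGMENT_BONDS))
    print(round(molecular_id(ids), 4))
    print(round(partial_id(ids, range(1, 7)), 4))  # six ring atoms only
```

The helper `partial_id` is a hypothetical name for the kind of restricted summation over selected atoms that is used later for the ring and side-chain atoms.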

COMPARISON  OF  DIFFERENT  STRUCTURES 

For the molecule illustrated in Figure 1, Table 2 provides
a set of graph invariants that may be used in comparing
similar data on a variety of other structures. A sequence,
such as the above list of paths of different lengths,
or a set of numbers, such as the list of all 'atomic'
numbers, clearly offers a broader basis for the comparison
of structures than (say) a single topological index, such
as the connectivity index originally introduced to discuss
the branching in alkane molecules and variations among
the physicochemical properties of isomeric species
(Randić, 1975). As will be evident, even the use of
a single number as a descriptor, e.g. a partial sum of
selected 'atomic' numbers, can yield extremely useful
information from the comparisons between structures.
To those not especially well versed in chemical graph
theory, it may come as something of a surprise that
a single number is able to capture so much of the
essential structural information associated with chemical
species, though this observation has been amply
corroborated in the manifold applications of the
connectivity index. Such numbers, which to the
uninitiated may appear to be ad hoc in origin, are
in fact based upon well-defined and important structural
invariants.

The information presented in Table 2 can be used in
several different ways. The path numbers for each
compound may be viewed as the components of a vector
and the degree of similarity existing between different
vectors then established. The similarity can be defined
in terms of the Euclidean distance between the position
vectors in n-dimensional space. This general type
of analysis has already been applied to the dopamines,
benzomorphans, barbiturates, and aminotetralins (Randić
and Wilkins, 1979c; Randić and Wilkins, 1979d; Randić
and Wilkins, 1980; Randić and Trinajstić, 1982).
Alternatively, structures may be represented by sets comprised
of the relevant atomic path sums, where the summation
is restricted to selected atoms only, as illustrated
in the search for optimal antitumor drugs (Randić,
1985a). Use of the molecular ID numbers for the purpose
of clustering compounds together can be made only
on the basis of similarities existing among the individual
ID values. In the cases of several therapeutically
valuable antihistamines, anticholinergics, antipsychotics,
antidepressants, analgesics, and antiparkinsonians,
however, surprisingly good classifications based solely
on this single structural parameter have been obtained
(Randić, 1984b).
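
The vector comparison mentioned above amounts to a Euclidean distance between path-number sequences. A minimal sketch follows. Padding the shorter sequence with zeros is our assumption, since the text does not say how sequences of unequal length are handled. The first vector is the unweighted path sequence quoted earlier for the compound of Figure 1; the second is our own count for a bare six-membered ring (6 atoms and 6 paths of each length from one to five) and is not taken from the paper.

```python
from math import sqrt


def path_distance(u, v):
    """Euclidean distance between two path-number sequences.

    The shorter sequence is padded with zeros (assumed convention), i.e.
    a molecule is taken to have no paths beyond its longest one.
    """
    size = max(len(u), len(v))
    u = list(u) + [0] * (size - len(u))
    v = list(v) + [0] * (size - len(v))
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Path sequence quoted in the text for the compound of Figure 1.
FIGURE_1_PATHS = [9, 9, 10, 11, 12, 12, 6, 4, 2]
# Path sequence of a plain hexagon, counted by us for illustration only.
HEXAGON_PATHS = [6, 6, 6, 6, 6, 6]

if __name__ == "__main__":
    print(path_distance(FIGURE_1_PATHS, HEXAGON_PATHS))
```

A smaller distance indicates greater similarity; identical sequences give a distance of zero. The same function applies unchanged to weighted path sequences.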

As will be evident from Table 2, ID numbers are
size-dependent. For the compounds we have considered
here, each atom contributes around 2.25 to 3.00 to
the ID number. It seems quite likely that such 'size'
effects may obscure some of the finer structural differences
existing among the compounds illustrated in
Figure 2. The effect may be especially pronounced
here because all the molecules concerned are relatively
small, i.e. they contain no more than 10-15 atoms
each, not counting the suppressed hydrogen atoms.
In the following section, we shall select a fragment
present in all the compounds of Figure 2. Comparison
of the compounds will be based solely upon the
characteristics of the atoms common to all the
structures considered.

CLUSTERING OF THE THERAPEUTICALLY RELATED SPECIES

The compounds represented in Figure 2, all of which
are therapeutically very efficacious, form a subset
of a collection of compounds investigated by Menon
and Cammarata (1977) using pattern recognition techniques.
From their collection of almost 40 compounds,
we have selected 18 compounds whose molecules contain
no cycles other than a single phenyl group, no chlorine
atoms as substituents, and no quaternary nitrogen
atoms. All of the selected compounds are closely related
structurally; apart from having a phenyl ring, they
all have a nitrogen atom three bonds removed from
this ring. They do differ, however, in the number, type,
and position of the various substituents they contain,
namely the hydroxyl group, the methyl or ethyl groups,
and occasionally the carbonyl group. In Table 3 a partial
path characterization of these compounds is presented,
with only the atomic path numbers appearing for the
nine atoms common to all of the compounds. Inspection
of Figure 2 reveals that the structure we have depicted
in Figure 1 is the largest fragment common to all the
18 compounds.

TABLE 3. The atomic ID number for the nine atoms common to all 18 structures of Fig. 2.

Drug    Atom positions:
           1        2        3        4
  1      3.016    2.992    2.985    2.992
  2      3.015    2.992    2.985    2.992
  3      3.025    2.997    2.989    2.997
  4      3.017    2.993    2.986    2.993
  5      3.016    2.980    3.198    2.980
  6      3.001    3.192    3.185    2.977
  7      3.004    3.203    2.971    2.990
  8      3.010    3.206    2.973    2.993
  9      3.201    2.995    2.988    2.995
 10      3.023    2.996    2.989    2.996
 11      3.009    3.196    3.188    2.981
 12      3.350    3.028    3.013    3.019
 13      3.009    3.196    3.188    2.981
 14      3.009    3.196    3.188    2.981
 15      3.008    3.196    3.188    2.981
 16      3.362    3.045    3.040    3.341
 17      3.014    3.198    3.190    2.984
 18      2.983    2.973    2.970    2.973

The partial sums of the atomic path numbers for the
nine common atoms are reported in Table 4. The nine
atoms have now been partitioned into two groups; the
six atoms constituting the phenyl ring are considered
separately (for reasons which will become apparent
later). The remaining entries in Table 4 are for the
three atoms forming the side chain (including the
nitrogen); the totals for the nine-atom fragment are
also given in each case. These latter totals we shall
refer to as the fragment ID numbers. Analysis of the
numbers indicates that the 'size' effect mentioned
above has now been eliminated. Use of the fragment
IDs to order the compounds, however,
leads to the disappointing result that such ordering
produces no significant pharmacological classification
of the compounds.


FIG. 3. The fragments thought to be essential for
the activity of (a) the morphines, (b) neuroleptics,
(c) mutagenic nitroarenes.


The nine-atom fragment is comparable in size with
a number of other groupings identified as performing
an essential pharmacophoric role in various drug
molecules. For instance, the empirical 'morphine rule'
fragment (Lednicer and Mitscher, 1977), the fundamental
structure proposed for neuroleptic action (Janssen,
1964), and the significant fragment in the mutagenic
nitroarenes (Klopman and Rosenkranz, 1984) are all
of a similar size and each is claimed to be specific.
The three fragments are illustrated in Figure 3. In
our case, the nine-atom group we consider is clearly
pharmacologically active, though its action is nonspecific.
What is required at this point is a finer differentiation
among the 18 compounds under consideration.

TABLE 4. Partial sums of atomic path numbers
for the ring and other atoms in the fragment (see Fig. 1).

Drug                          Ring ID    Side chain ID    Fragment ID
  1  Amphetamine              18.353        8.321           26.675
  2  Phenylpropanolamine      18.347        8.503           26.850
  3  Methamphetamine          18.404        8.788           27.193
  4  Phentermine              18.357        8.527           26.880
  5  Hydroxyamphetamine       18.548        8.326           26.875
  6  Levarterenol             18.719        8.253           26.973
  7  Metaraminol              18.541        8.506           27.047
  8  Phenylephrine            18.573        8.751           27.324
  9  Ephedrine                18.382        8.965           27.347
 10  Mephentermine            18.396        8.975           27.371
 11  Epinephrine              18.767        8.753           27.521
 12  Methoxyphenamine         18.870        8.814           27.685
 13  Ethylnorepinephrine      18.768        8.736           27.505
 14  Levodopa                 18.766        8.446           27.213
 15  Methyldopa               18.764        8.629           27.393
 16  Methoxamine              19.274        8.535           27.809
 17  Isoproterenol            18.769        9.047           27.843
 18  Diethylpropion           18.155        9.376           27.532

In Figure 4 a histogram is presented based only upon
the ring ID values, i.e. the values of the atomic path
sums for the six atoms constituting the phenyl ring
in the 18 compounds of interest. Rather surprisingly,
there is now a very evident clustering of all the central
nervous system (CNS) stimulants (which have lower values
of the ring ID), and similar clusterings for the β-agonists
and the CNS agents. The latter group, which clusters
in the central region of Figure 4, has too few compounds
within it to give any great statistical significance
to this particular finding. By contrast,
there is a wide scatter for the α-agonists over the
whole range of ring ID values.

FIG. 4. Classification of the bioactivity of the
18 compounds considered based on their ring ID numbers.
[Histogram; ring ID axis marked from 18.4 to 19.2. Labels, with
compound numbers in parentheses: α-agonists (2), (5), (6), (7), (8),
(9), (11), (16); β-agonists (11), (12), (13), (17); CNS agents (14),
(15); CNS stimulants (1), (2), (3), (4), (9), (10), (18).]

These observations, which are highly interesting, might
have escaped attention altogether if only a visual
inspection of the structures had been made. Clustering
of all the CNS stimulants within the range of ring ID
values lying between 18.35 and 18.40, which represents
a small interval compared to the full range of possible
ring ID values (from around 18.00 to 19.25) for the
compounds under study, indicates that rings lying within
this narrow range contain a specific structural component
essential for that particular type of pharmacologic
activity. This interval relates, of course, only
to unsubstituted phenyl rings, and careful inspection
of the molecular diagrams might have revealed that
such rings are essential for CNS stimulants. In the
case of the β-agonists, a phenyl ring with two substituent
hydroxyl groups appears to be essential, and the
same seems to apply in the CNS agents. These can
certainly be interpreted as positive results, although
the scatter of the α-agonist ring ID values over the
whole range of possible values must be seen as a negative
result. The finding that for α-agonistic-type activity
substitution (by hydroxyl groups) of the phenyl ring
may occur is without special significance.
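
The window test on ring ID values can be made concrete with the entries of Table 4. The sketch below is illustrative only. The window bounds 18.34 and 18.41 are our choice, taken slightly wider than the 18.35 - 18.40 interval quoted in the text so that the rounded table entries fall inside it, and the set of CNS stimulants is transcribed from the labels of Figure 4.

```python
# Ring ID values for the 18 compounds, keyed by drug number (from Table 4).
RING_ID = {
    1: 18.353, 2: 18.347, 3: 18.404, 4: 18.357, 5: 18.548, 6: 18.719,
    7: 18.541, 8: 18.573, 9: 18.382, 10: 18.396, 11: 18.767, 12: 18.870,
    13: 18.768, 14: 18.766, 15: 18.764, 16: 19.274, 17: 18.769, 18: 18.155,
}

# Compounds labeled as CNS stimulants in Figure 4.
CNS_STIMULANTS = {1, 2, 3, 4, 9, 10, 18}


def in_window(ring_ids, low, high):
    """Drug numbers whose ring ID lies in the closed interval [low, high]."""
    return sorted(d for d, value in ring_ids.items() if low <= value <= high)


def ranked(ring_ids):
    """Drug numbers ordered by increasing ring ID."""
    return sorted(ring_ids, key=ring_ids.get)


if __name__ == "__main__":
    window = in_window(RING_ID, 18.34, 18.41)  # bounds are our assumption
    print("in window:", window)
    print("stimulants outside window:", sorted(CNS_STIMULANTS - set(window)))
    print("order by ring ID:", ranked(RING_ID))
```

Every compound falling in the chosen window is labeled a CNS stimulant, and the only stimulant outside it is compound 18, which has the lowest ring ID of the set.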

CONCLUDING  REMARKS 

In this presentation only one particular aspect of the
graph-theoretical approach to quantitative
structure-activity relationships has been examined. After visually
identifying a common nine-atom fragment among a
group of therapeutically valuable drugs, attention was
focused on one critical component of the fragment.
This component was a ring which played a major role
in discriminating between structures for pharmacological
classification purposes. Thus, it is important to recognize
not only that fragments may be responsible for
pharmacological action, but also that such fragments may need to
be further subdivided in order to obtain a good correlation
between a given structure and its function. Even a
negative result, such as the discovery that the behavior
of a fragment is insensitive to selective substitution,
is of considerable interest in drug design studies. For
one thing it suggests that the least expensive derivative
may be used for any substitution which is irrelevant,
provided that other factors, such as toxicity and dosage,
remain unchanged.

If the compounds listed in Figure 2 are regarded as
lead compounds, the analysis presented here can serve
to indicate both productive and unfruitful approaches
to the design of enhanced drugs. In the case of CNS
stimulants, for instance, it is clear that it would be
undesirable to attempt to substitute the phenyl ring,
whereas for α-agonists this would be an allowed
possibility. The actual direction adopted will, of course,
depend very heavily on which particular standards are
recommended as optimal. Compounds which appear
most promising would in general differ least in the
essential fragment, that is to say the mathematical
characterization of the fragments should differ least
from that of the lead molecule. Although it is not
unreasonable to adopt the approach pursued in several
similar studies reported previously (Menon and
Cammarata, 1977; Trinajstić, 1983; Randić, 1985a),
one has now gained important additional insights. It
is much better known which part of the overall molecular
characterization is most crucial. By clustering together
structures that are most similar in their more significant
structural details, some uncertainties in the search
for optimal drugs can certainly be eliminated.